The Tradeoffs Between Open and Traditional Relation Extraction
Abstract
Traditional Information Extraction (IE) takes a relation name and hand-tagged examples of that relation as input. Open IE is a relation-independent extraction paradigm that is tailored to massive and heterogeneous corpora such as the Web. An Open IE system extracts a diverse set of relational tuples from text without any relation-specific input. How is Open IE possible? We analyze a sample of English sentences to demonstrate that numerous relationships are expressed using a compact set of relation-independent lexico-syntactic patterns, which can be learned by an Open IE system. What are the tradeoffs between Open IE and traditional IE? We consider this question in the context of two tasks. First, when the number of relations is massive, and the relations themselves are not pre-specified, we argue that Open IE is necessary. We then present a new model for Open IE called O-CRF and show that it achieves higher precision and nearly double the recall of the model employed by TEXTRUNNER, the previous state-of-the-art Open IE system. Second, when the number of target relations is small, and their names are known in advance, we show that O-CRF is able to match the precision of a traditional extraction system, though at substantially lower recall. Finally, we show how to combine the two types of systems into a hybrid that achieves higher precision than a traditional extractor, with comparable recall.
Similar resources
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...
Open Information Extraction with Tree Kernels
Traditional relation extraction seeks to identify pre-specified semantic relations within natural language text, while open Information Extraction (Open IE) takes a more general approach, and looks for a variety of relations without restriction to a fixed relation set. With this generalization comes the question, what is a relation? For example, should the more general task be restricted to rel...
The Effectiveness of Traditional and Open Relation Extraction for the Slot Filling Task at TAC 2011
Our goal in this paper is to investigate the effectiveness of relation extraction techniques for the slot-filling task. We discuss two relation extraction systems. YRES follows the traditional paradigm in relation extraction, where a system takes advantage of available examples for each relation to be extracted. On the other hand, SONEX follows the open relation extraction paradigm, where the r...
The Role of Geography in the Formation of Courtyard Types in Traditional Iranian Houses
Extended Abstract Introduction The formation of built spaces in composition with open spaces is one of the important subjects in designing architectural spaces. Different factors were important in the formation of open spaces in traditional architecture. The function of a building was one of these factors, because the quality of the design of courtyards and their elements depended on the ...
Ore extraction and blending optimization model in poly-metallic open pit mines by chance constrained one-sided goal programming
Determining the sequence of ore extraction is one of the most important problems in annual mine production scheduling. Production scheduling affects mining performance, especially in a poly-metallic open pit mine, when considering the imposed operational and physical constraints mandated by high levels of reliability in relation to the obtained actual results. One of the important operational con...
Publication date: 2008